Novel Bioinformatics Method for Identification of Genome-Wide Non-Canonical Spliced Regions Using RNA-Seq Data

Authors

  • Yongsheng Bai
  • Justin Hassler
  • Ahdad Ziyar
  • Philip Li
  • Zachary Wright
  • Rajasree Menon
  • Gilbert S. Omenn
  • James D. Cavalcoli
  • Randal J. Kaufman
  • Maureen A. Sartor
Abstract

SETTING During endoplasmic reticulum (ER) stress, the endoribonuclease (RNase) Ire1α initiates removal of a 26 nt region from the mRNA encoding the transcription factor Xbp1 via an unconventional mechanism (atypically within the cytosol). This causes an open reading frame-shift that leads to altered transcriptional regulation of numerous downstream genes in response to ER stress as part of the unfolded protein response (UPR). Strikingly, other examples of targeted, unconventional splicing of short mRNA regions have yet to be reported. OBJECTIVE Our goal was to develop an approach to identify non-canonical, possibly very short, splicing regions using RNA-Seq data and apply it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines to identify additional Ire1α targets. RESULTS We developed a bioinformatics approach called the Read-Split-Walk (RSW) pipeline, and evaluated it using two Ire1α heterozygous and two Ire1α-null samples. The 26 nt non-canonical splice site in Xbp1 was detected as the top hit by our RSW pipeline in heterozygous samples but not in the negative control Ire1α knockout samples. We compared the Xbp1 results from our approach with results obtained using the alignment programs BWA, Bowtie2, STAR, and Exonerate, and the Unix "grep" command. We then applied our RSW pipeline to RNA-Seq data from the SKBR3 human breast cancer cell line. RSW reported a large number of non-canonical spliced regions for 108 genes on chromosome 17, which were identified by an independent study. CONCLUSIONS We conclude that our RSW pipeline is a practical approach for identifying non-canonical splice junction sites on a genome-wide level. We demonstrate that our pipeline can detect novel splice sites in RNA-Seq data generated under similar conditions for multiple species, in our case mouse and human.
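
As an illustration only of the general split-read idea behind detecting a short excised region such as the 26 nt Xbp1 segment, the following Python sketch splits a read into a prefix and a suffix, locates each by exact match in a reference sequence, and reports the reference interval that lies between them. It is a toy under stated assumptions: exact string matching, a single short reference string, a hypothetical function name `find_spliced_gap`, a hypothetical `min_anchor` parameter, and made-up sequences. It is not the RSW pipeline, whose individual steps are not described in this abstract.

```python
from typing import Optional, Tuple


def find_spliced_gap(
    reference: str, read: str, min_anchor: int = 8
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the reference interval missing from the read.

    Hypothetical helper for illustration: tries each split point of the read,
    requires both pieces to match the reference exactly, and reports the gap
    between them. Returns None if the read matches contiguously or if no
    split yields two anchored pieces.
    """
    if read in reference:
        return None  # contiguous match, so no spliced-out region is implied
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)  # first exact occurrence only
        if left_pos == -1:
            continue
        gap_start = left_pos + len(left)
        # The right piece must lie strictly downstream of the left piece.
        right_pos = reference.find(right, gap_start)
        if right_pos > gap_start:
            return gap_start, right_pos
    return None


# Made-up sequences (not real Xbp1 sequence): two flanks around a 26 nt insert.
left_flank = "ACGTTGCAAGGCTA"
excised = "GGGGCCCC" * 3 + "GG"  # 26 nt placeholder region
right_flank = "TTAGACCATGACTG"
reference = left_flank + excised + right_flank

# A read that spans the junction after the 26 nt region has been removed.
junction_read = left_flank[-10:] + right_flank[:10]

gap = find_spliced_gap(reference, junction_read)
print(gap)
```

A real analysis would replace the exact-match lookups with an aligner and would need to handle sequencing errors, repeated sequence, and ambiguous junction positions, none of which this sketch attempts.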

Similar resources

SparseIso: a novel Bayesian approach to identify alternatively spliced isoforms from RNA-seq data

Motivation Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. Results We have developed a novel Bayesian method, SparseIso, to reliably i...


Next-Generation Sequencing for Personalized Cardiovascular Disease Care

Cardiovascular disease (CVD) is the leading cause of death worldwide. Prediction and prevention of CVD, such as coronary artery disease and atherosclerosis, traditionally depend on identification of risk factors. These factors are effective in the general assessment of CVD risk but are not consistent indicators for all indi...


BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS

MOTIVATION Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a work flow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene fin...


LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq.

Both canonical and alternative splicing of RNAs are governed by intronic sequence elements and produce transient lariat structures fastened by branch points within introns. To map precisely the location of branch points on a genomic scale, we developed LaSSO (Lariat Sequence Site Origin), a data-driven algorithm which utilizes RNA-seq data. Using fission yeast cells lacking the debranching enzy...


GIIRA - RNA-Seq driven gene finding incorporating ambiguous reads

MOTIVATION The reliable identification of genes is a major challenge in genome research, as further analysis depends on the correctness of this initial step. With high-throughput RNA-Seq data reflecting currently expressed genes, a particularly meaningful source of information has become commonly available for gene finding. However, practical application in automated gene identification is stil...



Journal title:

Volume 9, Issue 

Pages -

Publication date: 2014